Metrics for Mining Multisets
Abstract
A multiset (also referred to as a bag) is a collection of elements in which order is of no importance and elements need not be unique. For example, a vase with n blue and m red marbles is a multiset. We propose a new class of distance measures (metrics) designed for multisets; both distance measures and multisets are recurrent themes in many data mining [2] applications. One particular instance of this class originated from the need to cluster criminal behaviours, where each multiset consists of the crimes committed in one year. This metric generalises well-known distance measures such as the Jaccard and the Canberra distance. The distance measures in the class are parameterised by a function f; provided f satisfies a few simple restrictions, the result is always a valid metric. This flexibility allows the measures to be tailored to many domain-specific applications. The metrics in this class can be calculated efficiently. In the full paper, all proofs are given and various applications are shown.
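The abstract does not state the formula of the proposed class, so the following is only a minimal sketch of the two well-known distances it names, applied to multisets represented as Python collections.Counter objects. The function names, the helper support, and the marble counts are assumptions made for illustration; none of them come from the paper. The last lines check one elementary link between the two distances: on plain sets (all counts 0 or 1), the Canberra sum divided by the size of the union equals the Jaccard distance. Whether the paper's class uses this exact normalisation is not stated in the abstract.

```python
from collections import Counter


def support(x: Counter) -> set:
    """Distinct elements that occur at least once in the multiset."""
    return {e for e, count in x.items() if count > 0}


def jaccard_distance(x: Counter, y: Counter) -> float:
    """Classical Jaccard distance, computed on the supports of x and y."""
    union = support(x) | support(y)
    if not union:
        return 0.0  # two empty multisets are treated as identical
    return 1.0 - len(support(x) & support(y)) / len(union)


def canberra_distance(x: Counter, y: Counter) -> float:
    """Canberra distance on multiplicities: sum of |a - b| / (a + b)."""
    total = 0.0
    for e in support(x) | support(y):
        a, b = x[e], y[e]  # a missing element has count 0 in a Counter
        total += abs(a - b) / (a + b)
    return total


# Hypothetical vases of marbles (illustrative data only).
vase_a = Counter(blue=3, red=1)
vase_b = Counter(blue=1, red=1, green=2)

print(jaccard_distance(vase_a, vase_b))   # supports differ only in "green"
print(canberra_distance(vase_a, vase_b))  # 2/4 (blue) + 0 (red) + 2/2 (green)

# On plain sets, Canberra divided by the union size equals Jaccard.
s, t = Counter("abc"), Counter("bcd")
normalised = canberra_distance(s, t) / len(support(s) | support(t))
print(normalised == jaccard_distance(s, t))
```

Representing a multiset as a Counter keeps each distance linear in the number of distinct elements, which is consistent with the abstract's remark that such metrics can be calculated efficiently.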
Similar resources
General definitions for the union and intersection of ordered fuzzy multisets
Since its original formulation, the theory of fuzzy sets has spawned a number of extensions where the role of membership values in the real unit interval $[0, 1]$ is handed over to more complex mathematical entities. Amongst the many existing extensions, two similar ones, the fuzzy multisets and the hesitant fuzzy sets, rely on collections of several distinct values to represent fuzzy membershi...
Structural Classification of XML Documents Using Multisets
In this paper, we investigate the problem of clustering XML documents based on their structure. We represent the paths in an XML document as a multiset and use the symmetric difference operation on multisets to define certain metrics. These metrics are then used to obtain a measure of similarity between any two documents in a collection. Our technique was successfully applied to real and synthe...
Learning Rules from Very Large Databases Using Rough Multisets
This paper presents a mechanism called LERS-M for learning production rules from very large databases. It can be implemented using object-relational database systems, it can be used for distributed data mining, and it has a structure that matches well with parallel processing. LERS-M is based on rough multisets and it is formulated using relational operations with the objective to be tightly cou...
ON GENERALIZED FUZZY MULTISETS AND THEIR USE IN COMPUTATION
An orthogonal approach to the fuzzification of both multisets and hybrid sets is presented. In particular, we introduce $L$-multi-fuzzy and $L$-fuzzy hybrid sets, which are general enough and in spirit with the basic concepts of fuzzy set theory. In addition, we study the properties of these structures. Also, the usefulness of these structures is examined in the framework of mechanical multiset proc...
Publication date: 2007